feat(server): stream monolithic blob PUT straight to object store #7
Open
tonicmuroq wants to merge 1 commit into
persistMonolithicUpload spooled the entire blob to a disk tempfile (via the chunked-upload session machinery), hashed it, then read it back and uploaded it to the object store: two full passes plus ~2x disk I/O, with receive and upload running serially. For multi-GiB VM disk/memory blobs that's the bulk of a push (single PUTs were taking 2-4 min).

Monolithic PUT already knows the digest up front (it's in the URL), so there's no need to buffer: stream the request body through a sha256 hasher straight into a concurrent multipart upload (no disk), verify the digest once the stream drains, and delete on mismatch so the content-addressed key never keeps unverified bytes. Server-side digest verification is preserved; no client change. The chunked PATCH path (no first-party client uses it) still spools to disk, since its digest is only known at finalize.

GCS S3-compat streaming multipart validated against staging: 12 MiB in 3 concurrent 5 MiB parts, sha256 round-trip verified.
Problem
persistMonolithicUpload (the PUT /v2/<name>/blobs/<digest> path, the only push path our clients use) spooled the entire blob to a disk tempfile via the chunked-upload session machinery, hashed it, then read it back and uploaded it to the object store. So a push was two full passes and ~2x disk I/O, with receive and upload running serially. For multi-GiB VM disk/memory blobs this was the bulk of push time (single PUTs were taking 2–4 min).
Change
A monolithic PUT already knows the digest up front (it's in the URL), so there's no reason to buffer. Stream the request body through a sha256 hasher straight into a concurrent multipart upload (minio ConcurrentStreamParts, bounded at PartSize*NumThreads = 64 MiB × 4), verify the digest once the stream drains, and delete on mismatch, returning DIGEST_INVALID.

Measured live (deployed to cocoonstack-us, image redirect-stream-20260618)

Server-side PUT durations (pure upload-through-epoch, most accurate):

win10-20260618-2 (~10.4 GB, 5 layers): big layer A [...] 4.0K) sampled 3× during a multi-GB PUT → bytes stream straight through, nothing buffered whole. This is the core fix.

[...] cocoon snapshot export + per-blob sha256 buffering, not just the network upload.)

Remaining ceiling: bytes still transit epoch (TLS in + multipart out on ~3 CPU), so big layers cap at ~80–110 MB/s. GCS-direct push (presigned PUT) is a possible follow-up.
Relation to #6
Independent of #6 (blob GET redirect). Together: pull bypasses the proxy entirely (#6), push stops double-buffering and parallelizes the GCS leg (this). Fully removing epoch from the push data path (presigned PUT direct to GCS, first-party-trust + GCS-CRC32C) is a possible follow-up.